SPFTN: A Self-Paced Fine-Tuning Network for Segmenting Objects in Weakly Labelled Videos
Abstract
Object segmentation in weakly labelled videos is a challenging task that aims to learn category-specific video object segmentation using only video-level tags.
Existing work in this area suffers from several limitations:
• Lack of effective DNN-based learning frameworks
• Under-exploitation of context information
• Reliance on unstable negative video collections
These limitations prevent them from obtaining more promising performance.
Methodology
To this end, we propose a novel self-paced fine-tuning network (SPFTN)-based framework, which learns to exploit the context information within video frames and capture adequate object semantics without using negative videos.
To perform weakly supervised learning with a deep neural network, we make an early effort to integrate the self-paced learning regime and the deep neural network into a unified, compatible framework, leading to the self-paced fine-tuning network.
This integration enables the model to automatically select training samples from easy to hard, progressively improving the learning quality while avoiding negative samples.
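The easy-to-hard sample selection described above is the core of the self-paced learning regime. A common formulation uses a hard self-paced regularizer: samples whose current loss falls below a pace parameter are treated as "easy" and included in training, and the pace parameter is gradually relaxed so harder samples enter later. The following is a minimal sketch of that selection rule, assuming simple loss-threshold selection (the paper's exact regularizer and schedule are not specified here):

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced regularizer: a sample is selected (weight 1)
    only if its current loss is below the pace parameter lam.
    Larger lam admits harder samples into training."""
    return (np.asarray(losses, dtype=float) < lam).astype(float)

# Hypothetical per-sample losses after one training pass
losses = np.array([0.2, 0.9, 0.5, 1.5])

# Early in training, a small pace admits only the easy samples
w_early = self_paced_weights(losses, lam=0.6)   # → [1., 0., 1., 0.]

# As training proceeds, the pace is relaxed to include harder samples
w_late = self_paced_weights(losses, lam=1.0)    # → [1., 1., 1., 0.]
```

In a full training loop, these binary weights would multiply each sample's loss before back-propagation, so only the currently selected samples fine-tune the network at each round.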
Experimental Results
Comprehensive experiments on the large-scale YouTube-Objects and DAVIS datasets demonstrate that the proposed approach achieves superior performance as compared with:
• Other state-of-the-art methods
• The baseline networks and models
The results validate that by integrating self-paced learning with deep neural networks, SPFTN can effectively learn to segment objects in weakly labelled videos without requiring negative video collections, while fully exploiting context information for improved segmentation accuracy.